268 research outputs found

    On the Power of Adaptivity in Sparse Recovery

    Get PDF
    The goal of (stable) sparse recovery is to recover a kk-sparse approximation xx* of a vector xx from linear measurements of xx. Specifically, the goal is to recover xx* such that ||x-x*||_p <= C min_{k-sparse x'} ||x-x'||_q for some constant CC and norm parameters pp and qq. It is known that, for p=q=1p=q=1 or p=q=2p=q=2, this task can be accomplished using m=O(klog(n/k))m=O(k \log (n/k)) non-adaptive measurements [CRT06] and that this bound is tight [DIPW10,FPRU10,PW11]. In this paper we show that if one is allowed to perform measurements that are adaptive, then the number of measurements can be considerably reduced. Specifically, for C=1+epsC=1+eps and p=q=2p=q=2 we show - A scheme with m=O((1/eps)kloglog(neps/k))m=O((1/eps)k log log (n eps/k)) measurements that uses O(logkloglog(neps/k))O(log* k \log \log (n eps/k)) rounds. This is a significant improvement over the best possible non-adaptive bound. - A scheme with m=O((1/eps)klog(k/eps)+klog(n/k))m=O((1/eps) k log (k/eps) + k \log (n/k)) measurements that uses /two/ rounds. This improves over the best possible non-adaptive bound. To the best of our knowledge, these are the first results of this type. As an independent application, we show how to solve the problem of finding a duplicate in a data stream of nn items drawn from 1,2,...,n1{1, 2, ..., n-1} using O(logn)O(log n) bits of space and O(loglogn)O(log log n) passes, improving over the best possible space complexity achievable using a single pass.Comment: 18 pages; appearing at FOCS 201

    Lower Bounds for Sparse Recovery

    Get PDF
    We consider the following k-sparse recovery problem: design an m x n matrix A, such that for any signal x, given Ax we can efficiently recover x' satisfying ||x-x'||_1 <= C min_{k-sparse} x"} ||x-x"||_1. It is known that there exist matrices A with this property that have only O(k log (n/k)) rows. In this paper we show that this bound is tight. Our bound holds even for the more general /randomized/ version of the problem, where A is a random variable and the recovery algorithm is required to work for any fixed x with constant probability (over A).Comment: 11 pages. Appeared at SODA 201

    Stream Sampling for Frequency Cap Statistics

    Full text link
    Unaggregated data, in streamed or distributed form, is prevalent and come from diverse application domains which include interactions of users with web services and IP traffic. Data elements have {\em keys} (cookies, users, queries) and elements with different keys interleave. Analytics on such data typically utilizes statistics stated in terms of the frequencies of keys. The two most common statistics are {\em distinct}, which is the number of active keys in a specified segment, and {\em sum}, which is the sum of the frequencies of keys in the segment. Both are special cases of {\em cap} statistics, defined as the sum of frequencies {\em capped} by a parameter TT, which are popular in online advertising platforms. Aggregation by key, however, is costly, requiring state proportional to the number of distinct keys, and therefore we are interested in estimating these statistics or more generally, sampling the data, without aggregation. We present a sampling framework for unaggregated data that uses a single pass (for streams) or two passes (for distributed data) and state proportional to the desired sample size. Our design provides the first effective solution for general frequency cap statistics. Our \ell-capped samples provide estimates with tight statistical guarantees for cap statistics with T=Θ()T=\Theta(\ell) and nonnegative unbiased estimates of {\em any} monotone non-decreasing frequency statistics. An added benefit of our unified design is facilitating {\em multi-objective samples}, which provide estimates with statistical guarantees for a specified set of different statistics, using a single, smaller sample.Comment: 21 pages, 4 figures, preliminary version will appear in KDD 201

    External inverse pattern matching

    Get PDF
    We consider {\sl external inverse pattern matching} problem. Given a text \t of length nn over an ordered alphabet Σ\Sigma, such that Σ=σ|\Sigma|=\sigma, and a number mnm\le n. The entire problem is to find a pattern \pe\in \Sigma^m which is not a subword of \t and which maximizes the sum of Hamming distances between \pe and all subwords of \t of length mm. We present optimal O(nlogσ)O(n\log\sigma)-time algorithm for the external inverse pattern matching problem which substantially improves the only known polynomial O(nmlogσ)O(nm\log\sigma)-time algorithm introduced by Amir, Apostolico and Lewenstein. Moreover we discuss a fast parallel implementation of our algorithm on the CREW PRAM model

    Cross-Sender Bit-Mixing Coding

    Full text link
    Scheduling to avoid packet collisions is a long-standing challenge in networking, and has become even trickier in wireless networks with multiple senders and multiple receivers. In fact, researchers have proved that even {\em perfect} scheduling can only achieve R=O(1lnN)\mathbf{R} = O(\frac{1}{\ln N}). Here NN is the number of nodes in the network, and R\mathbf{R} is the {\em medium utilization rate}. Ideally, one would hope to achieve R=Θ(1)\mathbf{R} = \Theta(1), while avoiding all the complexities in scheduling. To this end, this paper proposes {\em cross-sender bit-mixing coding} ({\em BMC}), which does not rely on scheduling. Instead, users transmit simultaneously on suitably-chosen slots, and the amount of overlap in different user's slots is controlled via coding. We prove that in all possible network topologies, using BMC enables us to achieve R=Θ(1)\mathbf{R}=\Theta(1). We also prove that the space and time complexities of BMC encoding/decoding are all low-order polynomials.Comment: Published in the International Conference on Information Processing in Sensor Networks (IPSN), 201

    Deterministic Sampling and Range Counting in Geometric Data Streams

    Get PDF
    We present memory-efficient deterministic algorithms for constructing epsilon-nets and epsilon-approximations of streams of geometric data. Unlike probabilistic approaches, these deterministic samples provide guaranteed bounds on their approximation factors. We show how our deterministic samples can be used to answer approximate online iceberg geometric queries on data streams. We use these techniques to approximate several robust statistics of geometric data streams, including Tukey depth, simplicial depth, regression depth, the Thiel-Sen estimator, and the least median of squares. Our algorithms use only a polylogarithmic amount of memory, provided the desired approximation factors are inverse-polylogarithmic. We also include a lower bound for non-iceberg geometric queries.Comment: 12 pages, 1 figur

    Interval Selection in the Streaming Model

    Full text link
    A set of intervals is independent when the intervals are pairwise disjoint. In the interval selection problem we are given a set I\mathbb{I} of intervals and we want to find an independent subset of intervals of largest cardinality. Let α(I)\alpha(\mathbb{I}) denote the cardinality of an optimal solution. We discuss the estimation of α(I)\alpha(\mathbb{I}) in the streaming model, where we only have one-time, sequential access to the input intervals, the endpoints of the intervals lie in {1,...,n}\{1,...,n \}, and the amount of the memory is constrained. For intervals of different sizes, we provide an algorithm in the data stream model that computes an estimate α^\hat\alpha of α(I)\alpha(\mathbb{I}) that, with probability at least 2/32/3, satisfies 12(1ε)α(I)α^α(I)\tfrac 12(1-\varepsilon) \alpha(\mathbb{I}) \le \hat\alpha \le \alpha(\mathbb{I}). For same-length intervals, we provide another algorithm in the data stream model that computes an estimate α^\hat\alpha of α(I)\alpha(\mathbb{I}) that, with probability at least 2/32/3, satisfies 23(1ε)α(I)α^α(I)\tfrac 23(1-\varepsilon) \alpha(\mathbb{I}) \le \hat\alpha \le \alpha(\mathbb{I}). The space used by our algorithms is bounded by a polynomial in ε1\varepsilon^{-1} and logn\log n. We also show that no better estimations can be achieved using o(n)o(n) bits of storage. We also develop new, approximate solutions to the interval selection problem, where we want to report a feasible solution, that use O(α(I))O(\alpha(\mathbb{I})) space. Our algorithms for the interval selection problem match the optimal results by Emek, Halld{\'o}rsson and Ros{\'e}n [Space-Constrained Interval Selection, ICALP 2012], but are much simpler.Comment: Minor correction

    Lower bounds for sparse recovery

    Get PDF
    We consider the following k-sparse recovery problem: design an m x n matrix A, such that for any signal x, given Ax we can efficiently recover ^x satisfying x|| ^x||1 [less than or equal to] C min[subscript k]-sparse x'||x - x'||1. It is known that there exist matrices A with this property that have only O(k log(n=k)) rows. In this paper we show that this bound is tight. Our bound holds even for the more general random- ized version of the problem, where A is a random variable, and the recovery algorithm is required to work for any fixed x with constant probability (over A).David & Lucile Packard FoundationDanish National Research FoundationDanish National Research Foundation (MADALGO (Center for Massive Data Algorithmics))National Science Foundation (U.S.) (grant CCF-0728645)Cisco Community Fellowship Progra

    Pseudorandomness for Regular Branching Programs via Fourier Analysis

    Full text link
    We present an explicit pseudorandom generator for oblivious, read-once, permutation branching programs of constant width that can read their input bits in any order. The seed length is O(log2n)O(\log^2 n), where nn is the length of the branching program. The previous best seed length known for this model was n1/2+o(1)n^{1/2+o(1)}, which follows as a special case of a generator due to Impagliazzo, Meka, and Zuckerman (FOCS 2012) (which gives a seed length of s1/2+o(1)s^{1/2+o(1)} for arbitrary branching programs of size ss). Our techniques also give seed length n1/2+o(1)n^{1/2+o(1)} for general oblivious, read-once branching programs of width 2no(1)2^{n^{o(1)}}, which is incomparable to the results of Impagliazzo et al.Our pseudorandom generator is similar to the one used by Gopalan et al. (FOCS 2012) for read-once CNFs, but the analysis is quite different; ours is based on Fourier analysis of branching programs. In particular, we show that an oblivious, read-once, regular branching program of width ww has Fourier mass at most (2w2)k(2w^2)^k at level kk, independent of the length of the program.Comment: RANDOM 201
    corecore